A Microbenchmark Suite for OpenMP Tasks
Abstract
We present a set of extensions to an existing microbenchmark suite for OpenMP. The new benchmarks measure the overhead of the task construct introduced in the OpenMP 3.0 standard, and associated task synchronisation constructs. We present the results from a variety of compilers and hardware platforms, which demonstrate some significant differences in performance between different OpenMP implementations.
Similar resources
A Microbenchmark Study of OpenMP Overheads under Nested Parallelism
In this work we present a microbenchmark methodology for assessing the overheads associated with nested parallelism in OpenMP. Our techniques are based on extensions to the well known EPCC microbenchmark suite that allow measuring the overheads of OpenMP constructs when they are effected in inner levels of parallelism. The methodology is simple but powerful enough and has enabled us to gain int...
Evaluating OpenMP on Chip MultiThreading Platforms
Recent computer architectures provide new kinds of on-chip parallelism, including support for multithreading. This trend toward hardware support for multithreading is expected to continue for PC, workstation and high-end architectures. Given the need to find sequences of independent instructions, and the difficulty of achieving this via compiler technology alone, OpenMP could become an excellen...
Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite
The recent introduction of task dependencies in the OpenMP specification provides new ways of synchronizing tasks. Application programmers can now describe the data a task will read as input and write as output, letting the runtime system resolve fine-grain dependencies between tasks to decide which task should execute next. Such an approach should scale better than the excessive global synchro...
Speeding Up OpenMP Tasking
In this work we present a highly efficient implementation of OpenMP tasks. It is based on a runtime infrastructure architected for data locality, a crucial prerequisite for exploiting the NUMA nature of modern multicore multiprocessors. In addition, we employ fast work-stealing structures, based on a novel, efficient and fair blocking algorithm. Synthetic benchmarks show up to a 6-fold increase...
Micro-benchmarks for Cluster OpenMP Implementations: Memory Consistency Costs
The OpenMP memory model allows for a temporary view of shared memory that only needs to be made consistent when barrier or flush directives, including those that are implicit, are encountered. While this relaxed memory consistency model is key to developing cluster OpenMP implementations, it means that the memory performance of any given implementation is greatly affected by which memory is use...
Publication date: 2012